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CLAIMS 

What is claimed is: 

1. A method comprising: 

building a control flow graph (CFG) for a loop body of a sequential application 
program to form a CFG loop; 

updating nodes of the CFG loop to enclose identified critical sections of the 
sequential application program within pairs of boundary instructions; and 

modifying nodes of the CFG loop to reduce an amount of instructions between 
corresponding pairs of bonding instructions to form a modified CFG loop. 

2. The method of claim 1, wherein updating the CFG loop comprises: 
selecting an identified critical section of the sequential application program; 
inserting an await instruction within a top node of the CFG loop; 
inserting an advance instruction within a bottom node of the CFG loop; and 
repeating the selecting, inserting and inserting for each identified critical section 

of the sequential application program. 

3. The method of claim 1, wherein modifying nodes of the CFG loop 
comprises: 

hoisting identified motion candidate instructions within the nodes of the CFG loop 
using code motions with fixed await boundary instructions; 

sinking identified motion candidate instructions within the nodes of the CFG loop 
using code motion with fixed advance instructions; and 

hoisting identified motion candidate instructions within the nodes of the CFG loop 
with fixed await instructions and fixed advance instructions. 

4. The method of claim 3, wherein hoisting detected hoist instructions with 
fixed await instructions comprises: 

identifying every instruction within a basic block the CFG loop, excluding await 
instructions, as a motion candidate instructions; 

building an inverse graph of the CFG loop; 
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initializing a hoist queue with the basic blocks from the CFG loop, the basic 
blocks ordered according to a topological order indicated by the inverse graph; 

hoisting motion candidate instructions of the basic blocks until hoist instructions 
are no longer detected from the motion candidate instructions; and 

hoisting detected hoist instructions from motion candidate instructions in a source 
basic block of the CFG loop according to a dependence graph of the sequential 
application program. 

5. The method of claim 4, wherein hoisting detected hoist instructions from 
the motion candidate instructions of the basic blocks comprises: 

de-queuing a basic block from the hoist queue as a current block; 

computing hoist instructions from the motion candidate instructions of the basic 
blocks based on a dependence graph of the sequential application program; 

hoisting the computed hoist instructions into a corresponding basic block; and 

enqueuing the current block's predecessors from the CFG loop into the hoist 
queue when a change is detected. 

6. The method of claim 4, wherein sinking detected sink instructions with 
fixed advance instructions comprises: 

identifying motion candidate instructions within the basic blocks of the CFG loop 
through dataflow analysis with fixed advance instructions; 

initializing a sink queue with the basic blocks ordered based on a topological 
order in the CFG loop; 

sinking detected sink instructions among the basic blocks until sinking 
instructions are no longer detected; and 

sinking detected motion candidates within basic blocks that contain advance 
instructions according to the dependence graph. 

7. The method of claim 6, wherein sinking detected sink instructions among 
the basis blocks comprises: 

de-queue a basic block from the sink queue as a current block; 
computing sink instructions from motion candidate instructions based on a 
dependence graph of the sequential application program; 
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sinking computed sink instructions into a corresponding basic block; and 
en-queuing a current block's successors in the CFG loop into the sink queue if a 
change is detected. 

8. The method of claim 3, wherein performing instruction hoisting with both 
the await instructions and advance instructions fixed, comprises: 

initializing a hoist queue with the basic blocks ordered based on a topological 
order in the CFG loop; 

identifying motion candidate instructions within the basic blocks of the CFG loop 
through dataflow analysis with fixed advance instructions and fixed await instructions; 

hoisting detected hoist instructions among the basic blocks until hoist instructions 
are no longer detected; and 

hoist motion candidates within basic blocks that contain await instructions based 
on a dependence graph of the sequential application program. 

9. The method of claim 8, wherein motion candidate instructions hoisted out 
of an outmost await instruction are no longer treated as motion candidates; and 

wherein motion candidate instructions out of an outmost advance instruction are 
no longer treated as motion candidates. 

10. The method of claim 1, further comprising: 

forming a plurality of application program thread partitions from the modified 
CFG loop; and 

concurrently executing the plurality of application program threads within a 
respective thread of a multi-threaded architecture. 

11. An article of manufacture including a machine readable medium having 
stored thereon instructions which may be used to program a system to perform a method, 
comprising: 

building a control flow graph (CFG) for a loop body of a sequential application 
program to form a CFG loop; 

updating nodes of the CFG loop to enclose identified critical sections of the 
sequential application program within pairs of boundary instructions; and 
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modifying nodes of the CFG loop to reduce an amount of instructions between 
corresponding pairs of bonding instructions to form a modified CFG loop. 

12. The article of manufacture of claim 11, updating the CFG loop comprises: 
selecting an identified critical section of the sequential application program; 
inserting an await instruction within a top node of the CFG loop; 

inserting an advance instruction within a bottom node of the CFG loop; and 
repeating the selecting, inserting and inserting for each identified critical section 
of the sequential application program. 

13. The article of manufacture of claim 11, wherein modifying nodes of the 
CFG loop comprises: 

hoisting identified motion candidate instructions within the nodes of the CFG loop 
using code motions with fixed await boundary instructions; 

sinking identified motion candidate instructions within the nodes of the CFG loop 
using code motion with fixed advance instructions; and 

hoisting identified motion candidate instructions within the nodes of the CFG loop 
with fixed await instructions and fixed advance instructions. 

14. The article of manufacture of claim 13, wherein hoisting detected hoist 
instructions with fixed await instructions comprises: 

identifying every instruction within a basic block the CFG loop, excluding await 
instructions, as a motion candidate instructions; 

building an inverse graph of the CFG loop; 

initializing a hoist queue with the basic blocks from the CFG loop, the basic 
blocks ordered according to a topological order indicated by the inverse graph; 

hoisting motion candidate instructions of the basic blocks until hoist instructions 
are no longer detected from the motion candidate instructions; and 

hoisting detected hoist instructions from motion candidate instructions in a source 
basic block of the CFG loop according to a dependence graph of the sequential 
application program. 
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15. The article of manufacture of claim 14, wherein hoisting detected hoist 
instructions from the motion candidate instructions of the basic blocks comprises: 

de-queuing a basic block from the hoist queue as a current block; 

computing hoist instructions from the motion candidate instructions of the basic 
blocks based on a dependence graph of the sequential application program; 

hoisting the computed hoist instructions into a corresponding basic block; and 

enqueuing the current block's predecessors from the CFG loop into the hoist 
queue when a change is detected. 

16. The article of manufacture of claim 14, sinking detected sink instructions 
with fixed advance instructions comprises: 

identifying motion candidate instructions within the basic blocks of the CFG loop 
through dataflow analysis with fixed advance instructions; 

initializing a sink queue with the basic blocks ordered based on a topological 
order in the CFG loop; 

sinking detected sink instructions among the basic blocks until sinking 
instructions are no longer detected; and 

sinking detected motion candidates within basic blocks that contain advance 
instructions according to the dependence graph. 

17. The article of manufacture of claim 16, wherein sinking detected sink 
instructions among the basis blocks comprises: 

de-queue a basic block from the sink queue as a current block; 

computing sink instructions from motion candidate instructions based on a 
dependence graph of the sequential application program; 

sinking computed sink instructions into a corresponding basic block; and 

en-queuing a current block's successors in the CFG loop into the sink queue if a 
change is detected. 



18. The article of manufacture of claim 13, wherein performing instruction 
hoisting with both the await instructions and advance instructions fixed, comprises: 

initializing a hoist queue with the basic blocks ordered based on a topological 
order in the CFG loop; 
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identifying motion candidate instructions within the basic blocks of the CFG loop 
through dataflow analysis with fixed advance instructions and fixed await instructions; 

hoisting detected hoist instructions among the basic blocks until hoist instructions 
are no longer detected; and 

hoist motion candidates within basic blocks that contain await instructions based 
on a dependence graph of the sequential application program. 

19. The article of manufacture of claim 18, wherein motion candidate 
instructions hoisted out of an outmost await instruction are no longer treated as motion 
candidates; and 

wherein motion candidate instructions out of an outmost advance instruction are 
no longer treated as motion candidates. 

20. The article of manufacture of claim 11, further comprising: 
forming a plurality of application program thread partitions from the modified 

CFG loop; and 

concurrently executing the plurality of application program threads within a 
respective thread of a multi-threaded architecture. 

21. A method comprising: 

partitioning a sequential application program into a plurality of application 
program threads; and 

concurrently executing the plurality of application program threads within a 
respective thread of a multi-threaded architecture. 

22. The method of claim 21, wherein partitioning the sequential application 
program comprises: 

determining a thread count of a multi-threaded architecture; 

receiving identified critical sections within the sequential application program; 

and 

generating a plurality of application program threads according to the thread count 
to synchronize access to identified critical sections among the plurality of application 
program threads. 
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23. The method of claim 21, wherein concurrently executing further 
comprises: 

executing each iteration of a thread program loop by distinct multi-threaded 
architecture; and 

executing critical sections of the thread program loop sequential thread order. 

24. The method of claim 22, wherein generating the application program 
threads comprises: 

processing identified critical sections to reduce an amount of code contained 
within critical sections of thread program loops. 

25. The method of claim 24, wherein code motion is used to reduce the 
amount of code contained within critical sections of the thread program loops. 

26. An article of manufacture including a machine readable medium having 
stored thereon instructions which may be used to program a system to perform a method, 
comprising: 

partitioning a sequential application program into a plurality of application 
program threads; and 

concurrently executing the plurality of application program threads within a 
respective thread of a multi-threaded architecture. 

27. The article of manufacture of claim 26, wherein partitioning the sequential 
application program comprises: 

determining a thread count of a multi-threaded architecture; 

receiving identified critical sections within the sequential application program; 

and 

generating a plurality of application program threads according to the thread count 
to synchronize access to identified critical sections among the plurality of application 
program threads. 
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28. The article of manufacture of claim 26, wherein concurrently executing 
further comprises: 

executing each iteration of a thread program loop by distinct multi-threaded 
architecture; and 

executing critical sections of the thread program loop sequential thread order. 

29. The article of manufacture of claim 27, wherein generating the application 
program threads comprises: 

processing identified critical sections to reduce an amount of code contained 
within critical sections of thread program loops. 

30. The article of manufacture of claim 29, wherein code motion is used to 
reduce the amount of code contained within critical sections of the thread program loops. 

31. An apparatus, comprising: 
a processor; 

a memory coupled to the processor, the memory including a compiler to cause a 
partition a sequential application program into a plurality of application program-threads 
to enable concurrent execution of the plurality of program-threads within a respective 
thread of a multi-threaded architecture. 

32. The apparatus of claim 31, wherein the compiler to cause building a 
control flow graph (CFG) for a loop body of a sequential application program to form a 
CFG loop to cause an update of nodes of the CFG loop to enclose identified critical 
sections of the sequential application program within pairs of boundary instructions and 
to cause modification of nodes of the CFG loop to reduce an amount of instructions 
between corresponding pairs of bonding instructions to form a modified CFG loop. 

33. The apparatus of claim 32, wherein the compiler to cause hoisting 
identified motion candidate instructions within the nodes of the CFG loop using code 
motions with fixed await boundary instructions, to cause sinking of identified motion 
candidate instructions within the nodes of the CFG loop using code motion with fixed 
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advance instructions and to cause hoisting of motion candidate instructions within the 
nodes of the CFG loop with fixed await instructions and fixed advance instructions. 

34. A system comprising: 
a processor; 

a memory controller coupled to the processor; and 

a DDR SRAM memory coupled to the processor, the memory including a 
compiler to cause partitioning a sequential application program into a plurality of 
application program threads to enable concurrent execution of the plurality of application 
program threads within a respective thread of a multi-threaded architecture 

35. The system of claim 34, wherein the compiler to cause building a control 
flow graph (CFG) for a loop body of a sequential application program to form a CFG 
loop to cause an update of nodes of the CFG loop to enclose identified critical sections of 
the sequential application program within pairs of boundary instructions and to cause 
modification of nodes of the CFG loop to reduce an amount of instructions between 
corresponding pairs of bonding instructions to form a modified CFG loop. 

36. The system of claim 35, wherein the compiler to cause hoisting identified 
motion candidate instructions within the nodes of the CFG loop using code motions with 
fixed await boundary instructions, to cause sinking of identified motion candidate 
instructions within the nodes of the CFG loop using code motion with fixed advance 
instructions and to cause hoisting of motion candidate instructions within the nodes of the 
CFG loop with fixed await instructions and fixed advance instructions. 
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